    Coreference Resolution System for Indonesian Text with Mention Pair Method and Singleton Exclusion using Convolutional Neural Network

    Neural networks have shown promising performance in coreference resolution systems that use the mention pair method. A deep neural network can learn hidden and deep relations between two mentions. However, no prior work on coreference resolution for Indonesian text uses this learning technique. The state-of-the-art system for Indonesian text only reports that using lexical and syntactic features can improve the existing coreference resolution system. In this paper, we propose a new coreference resolution system for Indonesian text that follows the mention pair method and uses a deep neural network to learn the relation between two mentions. In addition to lexical and syntactic features, we learn representations of the mentions' words and context by feeding word embeddings to a Convolutional Neural Network (CNN). Furthermore, we perform singleton exclusion with a singleton classifier component, which prevents singleton mentions from entering any entity cluster at the end. Our proposed system outperforms the state-of-the-art system, achieving CoNLL average F1 scores of 67.37% without singleton exclusion, 63.27% with a trained singleton classifier, and 75.95% with a gold singleton classifier.
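
    The abstract does not say how pairwise decisions become entity clusters. A minimal sketch, assuming the CNN's output is already available as a score per mention pair (the hypothetical `pair_scores` dict) and that linked pairs are merged by transitive closure with union-find, shows where singleton exclusion fits: mentions the singleton classifier flags are kept out of every cluster. The function names, thresholds, and the closure strategy are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List, Set, Tuple


def find(parent: Dict[int, int], x: int) -> int:
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_mentions(
    num_mentions: int,
    pair_scores: Dict[Tuple[int, int], float],  # hypothetical CNN pair scores
    singleton_probs: List[float],  # hypothetical singleton classifier output
    link_threshold: float = 0.5,
    singleton_threshold: float = 0.5,
) -> List[Set[int]]:
    """Sketch: link pairs scoring >= link_threshold, never linking flagged singletons."""
    # Mentions predicted to be singletons are excluded before any linking.
    excluded = {i for i, p in enumerate(singleton_probs) if p >= singleton_threshold}
    parent = {i: i for i in range(num_mentions)}
    for (i, j), score in pair_scores.items():
        if score < link_threshold or i in excluded or j in excluded:
            continue
        parent[find(parent, i)] = find(parent, j)
    clusters: Dict[int, Set[int]] = {}
    for i in range(num_mentions):
        if i not in excluded:
            clusters.setdefault(find(parent, i), set()).add(i)
    # Only multi-mention groups count as entity clusters.
    return [c for c in clusters.values() if len(c) > 1]


scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.9, (3, 4): 0.6, (0, 4): 0.2}
with_exclusion = cluster_mentions(5, scores, [0.1, 0.2, 0.1, 0.9, 0.3])
without_exclusion = cluster_mentions(5, scores, [0.0] * 5)
```

    With mention 3 flagged as a singleton, the chain 2-3-4 is broken and only {0, 1, 2} survives; without exclusion, mention 3 bridges everything into one cluster. This illustrates how an error-prone singleton classifier can change the clustering, which is consistent with, but does not explain, the gap between the trained and gold singleton scores reported above.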

    Improvement of Fuzzy Geographically Weighted Clustering-Ant Colony Optimization Performance using Context-Based Clustering and CUDA Parallel Programming

    Geo-demographic analysis (GDA) is the study of population characteristics by geographical area. Fuzzy Geographically Weighted Clustering (FGWC) is an effective algorithm used in GDA. FGWC has previously been improved by integrating a metaheuristic algorithm, Ant Colony Optimization (ACO), as a global optimization tool that increases clustering accuracy in the initial stage of the FGWC algorithm. However, using ACO in FGWC increases the running time compared to the standard FGWC algorithm. In this paper, context-based clustering and CUDA parallel programming are proposed to improve the performance of this improved algorithm (FGWC-ACO). Context-based clustering groups data based on certain conditions, while CUDA parallel programming uses the graphics processing unit (GPU) as a parallel processing tool. The Indonesian Population Census 2010 was used as the experimental dataset. The experiments showed that the proposed methods improve the performance of FGWC-ACO without reducing the clustering quality of the original method. The clustering quality was evaluated using the clustering validity index.
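
    The "geographically weighted" part of FGWC is not defined in the abstract. A minimal sketch, assuming the commonly described gravity-style form, where the interaction between areas i and j is w_ij = (m_i * m_j)^b / d_ij^a (m = population, d = distance) and each area's fuzzy memberships are blended with a weighted average of its neighbours' memberships. Normalizing by the row's total weight, the parameter names `a`, `b`, `alpha`, and the toy data are assumptions for illustration; ACO initialization, centroid updates, and context-based clustering are not shown.

```python
import math
from typing import List, Sequence, Tuple


def interaction_weights(
    populations: Sequence[float],
    coords: Sequence[Tuple[float, float]],
    a: float = 1.0,
    b: float = 1.0,
) -> List[List[float]]:
    """Assumed gravity-style weights w_ij = (m_i * m_j)**b / d_ij**a (areas must not coincide)."""
    n = len(populations)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                w[i][j] = (populations[i] * populations[j]) ** b / d ** a
    return w


def geo_adjust(
    u: List[List[float]], w: List[List[float]], alpha: float = 0.7
) -> List[List[float]]:
    """u'_ik = alpha * u_ik + beta * sum_j(w_ij * u_jk) / sum_j(w_ij), with beta = 1 - alpha."""
    beta = 1.0 - alpha
    n, c = len(u), len(u[0])
    out = []
    for i in range(n):  # each area is independent, so this loop is data-parallel
        total = sum(w[i])
        row = []
        for k in range(c):
            neighbour_avg = sum(w[i][j] * u[j][k] for j in range(n)) / total
            row.append(alpha * u[i][k] + beta * neighbour_avg)
        out.append(row)
    return out


populations = [100.0, 200.0, 300.0]
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
u = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
w = interaction_weights(populations, coords)
adjusted = geo_adjust(u, w, alpha=0.7)
```

    Because each adjusted row is a convex combination of membership rows that already sum to 1, the result still sums to 1 per area. The outer loop over areas has no cross-iteration dependency, which is the kind of work a CUDA kernel could plausibly assign one thread per area; whether the paper parallelizes exactly this step is not stated.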

    Ensemble Technique Utilization for Indonesian Dependency Parser

    Indonesian Named-entity Recognition for 15 Classes Using Ensemble Supervised Learning

    Here, we describe our effort in building an Indonesian Named Entity Recognition (NER) system for newspaper articles with 15 classes, a larger number of class types than existing Indonesian NER systems. We employed supervised machine learning and conducted experiments to find the attribute combination and the algorithm with the highest accuracy. We compared word-level, sentence-level, and document-level attributes. For the algorithm, we compared several single machine learning algorithms as well as an ensemble. Using 457 news articles, the best accuracy was achieved with the ensemble technique, in which the outputs of several machine learning algorithms were used as features for one machine learning algorithm.
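
    The ensemble described here, where base-model predictions become features for another model, is usually called stacking. A minimal sketch, assuming token-level tagging and two hypothetical rule-based base classifiers in place of trained models; the meta-learner is a simple lookup table (most frequent gold label per pattern of base outputs) standing in for whatever algorithm the authors used. In practice the meta-learner is normally trained on held-out base predictions to avoid overfitting, which this toy skips.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence, Tuple

Classifier = Callable[[str], str]


class StackedTagger:
    """Stacking sketch: base-classifier labels form the meta-learner's feature vector."""

    def __init__(self, base: Sequence[Classifier]) -> None:
        self.base = list(base)
        self.table: Dict[Tuple[str, ...], str] = {}

    def _features(self, token: str) -> Tuple[str, ...]:
        # One feature per base classifier: its predicted label for the token.
        return tuple(clf(token) for clf in self.base)

    def fit(self, tokens: Sequence[str], gold: Sequence[str]) -> "StackedTagger":
        counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
        for tok, label in zip(tokens, gold):
            counts[self._features(tok)][label] += 1
        # Meta-learner: most frequent gold label for each pattern of base outputs.
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
        return self

    def predict(self, token: str) -> str:
        feats = self._features(token)
        if feats in self.table:
            return self.table[feats]
        # Unseen pattern: fall back to a majority vote over the base labels.
        return Counter(feats).most_common(1)[0][0]


# Hypothetical toy base classifiers, not the paper's models.
GAZETTEER = {"Jakarta": "LOCATION", "Bandung": "LOCATION"}


def capital_rule(tok: str) -> str:
    return "PERSON" if tok[:1].isupper() else "O"


def gazetteer_rule(tok: str) -> str:
    return GAZETTEER.get(tok, "O")


tagger = StackedTagger([capital_rule, gazetteer_rule]).fit(
    ["Jakarta", "Budi", "makan", "Bandung", "Siti"],
    ["LOCATION", "PERSON", "O", "LOCATION", "PERSON"],
)
```

    Neither base rule is correct alone (the capitalization rule calls every city a PERSON), but the meta-learner learns that the pattern (PERSON, LOCATION) means LOCATION, which is the benefit stacking aims for.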

    AUTOMATIC KEYWORD EXTRACTION FOR INDONESIAN-LANGUAGE DOCUMENTS, CASE STUDY: SCIENTIFIC JOURNAL ARTICLES IN THE PDII LIPI COLLECTION

    Keyword determination using a controlled vocabulary is not a difficult task for information analysts. However, specifying keywords for hundreds or even thousands of articles takes analysts considerable time and effort. To ease this work, an automatic keyword extraction system is needed. The system passes through the stages of preprocessing, translation, and matching keyword candidates against a list of keywords. The research was carried out using 33 articles taken from the PDII LIPI journal collection. Three weighting methods were employed: TF, TF x IDF, and WIDF. The best result was obtained with the TF x IDF method. To refine the result, the author corrected the keyword results and applied the Levenshtein algorithm.
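
    The three weighting schemes and the Levenshtein step can be sketched as follows, assuming documents are already preprocessed into token lists. TF is raw term frequency, TF x IDF uses IDF(t) = log(N / df(t)) (one common variant among several), and WIDF is taken as TF(d, t) divided by the total frequency of t across all documents. The helper `closest_keyword` and its `max_dist` cutoff are hypothetical illustrations of snapping a candidate to a controlled-vocabulary entry; the paper's exact variants and thresholds are not given.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence


def tf(doc: Sequence[str]) -> Dict[str, int]:
    # Raw term frequency within one document.
    return dict(Counter(doc))


def tf_idf(docs: Sequence[Sequence[str]], i: int) -> Dict[str, float]:
    # TF x IDF for document i, with IDF(t) = log(N / df(t)).
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: c * math.log(n / df[t]) for t, c in Counter(docs[i]).items()}


def widf(docs: Sequence[Sequence[str]], i: int) -> Dict[str, float]:
    # WIDF(d, t) = TF(d, t) / sum of TF(., t) over all documents.
    total = Counter(t for d in docs for t in d)
    return {t: c / total[t] for t, c in Counter(docs[i]).items()}


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def closest_keyword(candidate: str, vocabulary: Sequence[str],
                    max_dist: int = 2) -> Optional[str]:
    # Hypothetical helper: snap a candidate to the nearest vocabulary term.
    best = min(vocabulary, key=lambda v: levenshtein(candidate, v))
    return best if levenshtein(candidate, best) <= max_dist else None


docs = [["padi", "beras", "padi"], ["beras", "jagung"], ["padi", "kedelai"]]
```

    For the first document, "padi" gets TF 2, TF x IDF 2 log(3/2), and WIDF 2/3, while "beras" gets 1, log(3/2), and 1/2. A misspelled candidate such as "kedelay" is one edit away from "kedelai" and would be snapped to it.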